Adaptive Step-Size for Policy Gradient Methods
Authors
Abstract
In the last decade, policy gradient methods have significantly grown in popularity in the reinforcement learning field. In particular, they have been widely employed in motor control and robotic applications, thanks to their ability to cope with continuous state and action domains and partially observable problems. Policy gradient research has mainly focused on identifying effective gradient directions and on proposing efficient estimation algorithms. Nonetheless, the performance of policy gradient methods is determined not only by the gradient direction: convergence properties are strongly influenced by the choice of the step size, since small values imply a slow convergence rate, while large values may lead to oscillations or even divergence of the policy parameters. The step-size value is usually chosen by hand tuning, and little attention has been paid to its automatic selection. In this paper, we propose to determine the learning rate by maximizing a lower bound on the expected performance gain. Focusing on Gaussian policies, we derive a lower bound that is a second-order polynomial of the step size, and we show how a simplified version of this bound can be maximized when the gradient is estimated from trajectory samples. The properties of the proposed approach are empirically evaluated on a linear-quadratic regulator problem.
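As a concrete illustration of the idea stated in the abstract, the following minimal Python sketch assumes the performance-improvement lower bound takes the generic concave-quadratic form B(α) = aα − cα², with a = ‖∇̂θJ(θ)‖² and a problem-dependent penalty constant c > 0; the constant c used here is a placeholder, not the paper's exact second-order term. The maximizing step size is then α* = a/(2c).

import numpy as np

def optimal_step_size(grad_estimate, c):
    # Maximize the quadratic lower bound B(alpha) = a*alpha - c*alpha**2,
    # where a = ||grad||^2 and c > 0 is a problem-dependent penalty constant
    # (an illustrative stand-in for the paper's second-order term).
    a = float(grad_estimate @ grad_estimate)   # first-order improvement term
    return a / (2.0 * c)                       # maximizer of the concave quadratic

# Toy usage with an arbitrary gradient estimate and penalty constant.
grad = np.array([0.4, -1.2, 0.7])
alpha = optimal_step_size(grad, c=5.0)
theta = np.zeros(3)
theta_new = theta + alpha * grad               # gradient-ascent update with adaptive step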
Similar papers
Adaptive Step-size Policy Gradients with Average Reward Metric
In this paper, we propose a novel adaptive step-size approach for policy gradient reinforcement learning. A new metric is defined for policy gradients that measures the effect of changes on average reward with respect to the policy parameters. Since the metric directly measures the effects on the average reward, the resulting policy gradient learning employs an adaptive step-size strategy that ...
Adaptive Batch Size for Safe Policy Gradients
PROBLEM
• Monotonically improve a parametric Gaussian policy πθ in a continuous MDP, avoiding unsafe oscillations in the expected performance J(θ).
• Episodic Policy Gradient (a sketch follows this list):
  – estimate ∇̂θJ(θ) from a batch of N sample trajectories;
  – update θ′ ← θ + Λ∇̂θJ(θ).
• Tune the step size α and the batch size N to limit oscillations. Not trivial:
  – Λ: trade-off with speed of convergence ← adaptive methods.
  – N: trade-off...
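A minimal sketch of the episodic policy-gradient step referenced in the list above, assuming a linear-mean Gaussian policy a ~ N(θᵀs, σ²) and a REINFORCE-style estimator; the estimator and the trajectory format are illustrative assumptions, not the exact ones used in that work.

import numpy as np

def estimate_policy_gradient(trajectories, theta, sigma=1.0):
    # REINFORCE-style estimate of grad_theta J(theta), averaged over a batch of
    # N trajectories, each given as a list of (state, action, reward) tuples,
    # for a Gaussian policy with mean theta^T s and fixed standard deviation sigma.
    grads = []
    for traj in trajectories:
        ret = sum(r for _, _, r in traj)                      # trajectory return
        score = sum(((a - theta @ s) / sigma ** 2) * s        # sum of per-step score functions
                    for s, a, _ in traj)
        grads.append(ret * score)
    return np.mean(grads, axis=0)

# One update with step size alpha and batch size N (both to be tuned):
#   grad_hat = estimate_policy_gradient(batch_of_N_trajectories, theta)
#   theta = theta + alpha * grad_hat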
Gradient Methods with Adaptive Step-Sizes
Motivated by the superlinear behavior of the Barzilai-Borwein (BB) method for two-dimensional quadratics, we propose two gradient methods which adaptively choose a small step-size or a large step-size at each iteration. The small step-size is primarily used to induce a favorable descent direction for the next iteration, while the large step-size is primarily used to produce a sufficient reducti...
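A minimal sketch of the two classic Barzilai-Borwein step sizes that such adaptive small/large schemes choose between, applied to a toy quadratic; the simple alternation rule used here is illustrative, not the selection rule proposed in that paper.

import numpy as np

def bb_step_sizes(x_prev, x_curr, g_prev, g_curr):
    # The two Barzilai-Borwein step sizes computed from successive iterates
    # and gradients: BB2 (short) and BB1 (long).
    s = x_curr - x_prev
    y = g_curr - g_prev
    return (s @ y) / (y @ y), (s @ s) / (s @ y)   # (short, long)

# Gradient descent on f(x) = 0.5 * x^T A x, alternating short and long steps.
A = np.diag([1.0, 10.0])
grad = lambda x: A @ x
x_prev, x_curr = np.array([1.0, 1.0]), np.array([0.9, 0.5])
for k in range(15):
    if np.linalg.norm(grad(x_curr)) < 1e-10:      # stop once (numerically) converged
        break
    short, long_ = bb_step_sizes(x_prev, x_curr, grad(x_prev), grad(x_curr))
    alpha = short if k % 2 else long_             # naive alternation, for illustration only
    x_prev, x_curr = x_curr, x_curr - alpha * grad(x_curr)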
Off-policy learning based on weighted importance sampling with linear computational complexity
Importance sampling is an essential component of model-free off-policy learning algorithms. Weighted importance sampling (WIS) is generally considered superior to ordinary importance sampling but, when combined with function approximation, it has hitherto required computational complexity that is O(n2) or more in the number of features. In this paper we introduce new off-policy learning algorit...
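For context, a minimal sketch contrasting ordinary and weighted importance sampling on per-trajectory returns; extending the weighted estimator to linear function approximation at O(n) cost is the contribution of the work above and is not shown here. The returns and ratios in the usage line are made up.

import numpy as np

def is_estimates(returns, rhos):
    # Ordinary vs. weighted importance sampling estimates of an off-policy value,
    # given returns G_i observed under the behaviour policy and per-trajectory
    # importance ratios rho_i = prod_t pi(a_t|s_t) / mu(a_t|s_t).
    returns = np.asarray(returns, dtype=float)
    rhos = np.asarray(rhos, dtype=float)
    ois = np.mean(rhos * returns)                  # unbiased, but high variance
    wis = np.sum(rhos * returns) / np.sum(rhos)    # biased, usually lower variance
    return ois, wis

# Toy usage.
print(is_estimates([1.0, 0.0, 2.0], [0.5, 2.0, 1.5]))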
A stochastic gradient adaptive filter with gradient adaptive step size
This paper presents an adaptive step-size gradient adaptive filter. The step size of the adaptive filter is changed according to a gradient descent algorithm designed to reduce the squared estimation error during each iteration. An approximate analysis of the performance of the adaptive filter when its inputs are zero mean, white, and Gaussian and the set of optimal coefficients are time varyin...
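A minimal sketch of an LMS filter whose step size is itself adapted by a gradient rule on the squared error, in the spirit of the scheme described above; the update constants, clipping bounds, and system-identification setup are illustrative choices, not taken from the paper.

import numpy as np

def lms_with_adaptive_mu(x, d, order=4, mu0=0.01, rho=1e-4):
    # Standard LMS coefficient update w <- w + mu * e * u, where the step size mu
    # is adjusted using the correlation between successive errors and regressors
    # (a gradient-descent rule on the squared estimation error).
    w = np.zeros(order)
    mu = mu0
    u_prev, e_prev = np.zeros(order), 0.0
    for n in range(order - 1, len(x)):
        u = x[n - order + 1:n + 1][::-1]           # regressor [x[n], ..., x[n-order+1]]
        e = d[n] - w @ u                           # estimation error
        mu = float(np.clip(mu + rho * e * e_prev * (u @ u_prev), 1e-6, 0.1))
        w = w + mu * e * u
        u_prev, e_prev = u, e
    return w

# Identify a toy FIR system from noisy measurements.
rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
true_w = np.array([0.5, -0.3, 0.2, 0.1])
d = np.convolve(x, true_w)[:len(x)] + 0.01 * rng.standard_normal(len(x))
w_hat = lms_with_adaptive_mu(x, d)                 # should approach true_w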